Each segment is connected to the next one, so don't worry if a concept is mentioned without being explained in detail right away.

The lectures were delivered by Pavel Sinitcyn for a special module on proteomics in May 2019 at ITMO, for students of Bioinformatics and Systems Biology.

Author: Mrinal Vashisth, MS Bioinformatics and Systems Biology, mrinalmanu10@gmail.com

Please consider expanding sections, and correcting errors.

Proteomics

All proteins of the cell constitute the proteome. It is highly diverse and dynamic, and it covers not only the proteins themselves but also protein-protein interactions and interactions with other molecules. The proteome also includes post-translational modifications.

proteomics.png

One major difference between proteomics and genomics is the absence of an amplification step, i.e. there is no equivalent of PCR for protein molecules. This limits us to the amount of protein present in the sample.

For proteomics we use mass spectrometry (MS), often in tandem. Tandem MS is called MS-MS (or MS/MS), i.e. two MS stages are performed one after the other.

MS-MS is used to produce structural information about a compound by fragmenting specific sample ions inside the mass spectrometer and identifying the resulting fragment ions.

This information can then be pieced together to generate structural information regarding the intact molecule. Tandem mass spectrometry also enables specific compounds to be detected in complex mixtures on account of their specific and characteristic fragmentation patterns.

Mass Spectrometry

There are many ion-capturing techniques in MS, but we will discuss MS1 (the first MS of MS-MS) with an Orbitrap-based instrument. The idea is simple.

A proteomics instrument assembly has three parts: a sample preparation part, which feeds into MS1, which in turn feeds into MS2.

instrument.png

In this image we can see that the sample preparation part of the machine (smaller) feeds into a larger MS-MS machine. At the heart of the MS-MS machine is an ion-trapping mass analyser such as the Orbitrap.

Check out this video: https://www.youtube.com/watch?v=K1VSYjuw6os

There are three parts to an MS experiment.

1.) Sample preparation for MS-MS: Ionisation is the general way of preparing a sample for MS or gas chromatography based experiments (for inorganic molecules). The major difference is that proteins are organic and can be destroyed if we try to ionise them by heating. A special technique called MALDI (Matrix-Assisted Laser Desorption/Ionisation) is thus used.

Maldi.jpg

Basically, a laser scans the surface of a tissue sample and desorbs excited biomolecules from it. A benefit of this technique is that we can localise the spectrum, i.e. we know exactly which part of the tissue the proteins came from.

After we get an ionised sample, we feed it to the MS-MS machine.

2.) MS-MS: There are two parts to MS-MS, MS1 and MS2. MS1 is non-specific: we take the whole sample and feed small amounts of it to the Orbitrap or another ion-capturing device. This generates a signal graph (mass-to-charge ratio against retention time). We can deconvolute this signal and find the individual masses of the ions fed into the ion-capturing device (Orbitrap or ion cyclotron).

TOF.png

In MS2 we focus on specific molecules. We use mass-to-charge filtering to select only specific peptides, and these peptides are broken down into smaller pieces for finer resolution. MS2 can be adapted cleverly to verify or study protein structures.

It is important to remember that in MS1 we are getting protein spectra, while in MS2 we focus on finer details of the proteins of interest.

Check out this video: https://www.youtube.com/watch?v=ESKpOcjF8QM

image.png

(https://www.sciencedirect.com/science/article/pii/S1387380617303470)

3.) Data analyses and interpretation

The data generated by an MS-MS experiment come in the form of peaks. Each measurement in MS1, for example, takes 10 ms. We obtain a spectrum of different mass-to-charge ratios. We can compare these ratios to a pre-existing database, or go de novo (using machine learning models) to predict peptides.

The intensity of a peak represents the expression level of a given protein in a sample. So if the peaks of the controls are three times higher than those of the cases (condition), we can say that expression in the condition is reduced to 1/3 of the control level.
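The comparison of peak intensities can be written down as a small calculation. This is only a sketch: the function name and the intensity values are made up for illustration, and real tools such as MaxQuant normalise the intensities before computing ratios.

```python
import math


def fold_change(case_intensity, control_intensity):
    """Return the case/control intensity ratio (hypothetical helper)."""
    if control_intensity <= 0:
        raise ValueError("control intensity must be positive")
    return case_intensity / control_intensity


# Illustrative numbers only: the control peak is three times higher than the case peak.
control = 3.0e6
case = 1.0e6

ratio = fold_change(case, control)
log2_ratio = math.log2(ratio)
print(f"case/control = {ratio:.3f}, log2 ratio = {log2_ratio:.2f}")
```

A ratio below 1 (a negative log2 ratio) means lower expression in the condition. Log2 ratios are often preferred because increases and decreases of the same size become symmetric around zero.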

Check out this video: https://www.youtube.com/watch?v=x8DyTXHafd4

Although this video is about PEAKS, the idea is the same as MaxQuant.

dec_ms-ms.png

This is how these peaks look in 3D.

peaks.png

peaks_2.png

To differentiate between two conditions we label the samples, so that their peaks can be told apart. If we don't label, the approach is called label-free (used in data-independent acquisition).

peaks_labelling.png

In this image we can see the different methods of labelling. The basic idea is to make all Arginines in one of the samples heavier than natural Arginine by growing the cells on an Arginine-heavy diet (Arginine containing more neutrons than usual). Thus either the case or the control contains only Arginine with 10 additional neutrons, and each of its Arginine-containing peptide peaks is shifted from the normal peak by about 10 Da, i.e. a distance of 5 m/z units for a doubly charged peptide.
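The spacing of 5 units follows because the instrument measures mass divided by charge, not mass. A minimal sketch, assuming a nominal shift of 10 Da for one heavy Arginine (the helper name is made up for illustration):

```python
def mz_spacing(mass_shift_da, charge):
    """Distance on the m/z axis between the light and heavy peak of a peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift_da / charge


ARG10_NOMINAL_SHIFT_DA = 10.0  # assumption: nominal shift, one heavy Arginine per peptide

for z in (1, 2, 3):
    print(f"charge {z}+: peaks are {mz_spacing(ARG10_NOMINAL_SHIFT_DA, z):.2f} m/z apart")
```

Tryptic peptides are frequently doubly charged, which is why a spacing of 5 is the one usually seen.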

Chemical labelling is rarely used unless there is a good reason. The problem is that the chemicals may interfere with the peptides. In one experiment GFP (green fluorescent protein) was used as a label; the GFP was attached at the N terminus and at the C terminus of the peptide in two different experiments to show that the result was the same. This means that GFP can be safely used as a chemical label without biasing the experiment.

The idea is that mass-to-charge detection in the Orbitrap is extremely sensitive, to the degree that if we add or remove one neutron the instrument registers it as a separate peak, as you can see in this image.

peaks_3.png

Proteomics experiments

There are two major areas of proteomics: 1.) Identification and quantification of peptides, proteins, and PTM (post translational modifications) 2.) Downstream analysis: here the main focus is biological interpretation of the quantitative results.

Furthermore, proteomics can be bottom-up or top-down.

top_down_bottom_up.png

Top-down: Here we start with whole proteins and study them in MS. For this we do feature deconvolution (separation of a signal into its fundamental components, e.g. a signal can be seen as a superposition of multiple sine waves).

Bottom-up: We start from the peptides obtained by digesting proteins with digestive enzymes (proteases). This is also called the shotgun approach, since we are blowing up the proteins into shorter fragments.

In bottom-up proteomics, data-dependent acquisition is typically used.
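The digestion step of the bottom-up approach can be sketched in silico. The snippet below assumes trypsin as the enzyme and uses the simplified rule that it cleaves after every lysine (K) or arginine (R). The function name and the example sequence are made up for illustration, and real tools also handle missed cleavages and the proline exception.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    peptides = []
    current = []
    for residue in sequence:
        current.append(residue)
        if residue in "KR":
            peptides.append("".join(current))
            current = []
    if current:  # C-terminal peptide that does not end in K or R
        peptides.append("".join(current))
    return peptides


protein = "MKWVTFISLLLLFSSAYSRGVFRRDTHK"  # illustrative sequence only
peptides = tryptic_digest(protein)
for peptide in peptides:
    print(peptide)
```

Joining the peptides back together gives the original sequence, which is a quick sanity check for any digestion routine.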

Data acquisition modes in MS-MS experiments

acquisition_modes.png

In this image we can see three types of data acquisition in an MS-MS experiment.

Each image is the end result of an MS-MS experiment. The red boxes represent the selected regions. We can also see lines in the images; these are systematic errors arising from impurities, and we don't need to worry about them.

In data-dependent acquisition the algorithm selects based on pre-existing values for retention time versus m/z ratio. The problem is that we are dependent on a database, which can make us miss novel features in our data.

The opposite end of this spectrum is to take a window of fixed length and width and scan the entire image with it. This is a computationally heavy process, but the benefit is that we are not dependent on databases; in other words, we have a data-independent acquisition of features.

Finally, the third approach is targeted acquisition. Here, depending on our experiment type, we know the specific retention time versus m/z features to target. This is a context-based, fast, and experiment-specific approach. It is particularly useful when we have a large number of samples and specific proteins of interest.

A standard proteomics experiment

MS_MS.png

a. MS1: The protein under investigation would be analysed by mass spectrometry to generate a molecular mass to within an accuracy of 0.01%.

b. MS2: The protein would then be digested with a suitable enzyme. Trypsin is useful for mass spectrometric studies because each proteolytic fragment contains a basic arginine (R) or lysine (K) amino acid residue, and thus is specially suitable for positive ionisation mass spectrometric analysis.

orbitrap.jpg

The digest mixture is analysed - without prior separation or clean-up - by mass spectrometry to produce a complex spectrum from which the molecular weights of all of the proteolytic fragments can be read.

This spectrum, with its molecular weight information, is called a peptide map. (If the protein already exists on a database, then the peptide map is often sufficient to confirm the protein.)

c. With the digest mixture still spraying into the mass spectrometer, the Q-Tof mass spectrometer is switched into "MS/MS" mode.

The protonated molecular ions of each of the digest fragments can be independently selected and transmitted through the quadrupole analyser (entry point for MS2), which is now used as an analyser to transmit solely the ions of interest into the collision cell which lies in-between the first and second analysers. An inert gas such as argon is introduced into the collision cell and the sample ions are bombarded by the collision gas molecules which cause them to fragment. The optimum collision cell conditions vary from peptide to peptide and must be optimised for each one.

The fragment (or daughter or product) ions are then analysed by the second (time-of-flight) analyser. In this way an MS/MS spectrum is produced showing all the fragment ions that arise directly from the chosen parent or precursor ions for a given peptide component.

An MS/MS daughter (or fragment, or product) ion spectrum is produced for each of the components identified in the proteolytic digest. Varying amounts of sequence information can be predicted from each fragmentation spectrum, and the spectra need to be interpreted carefully. Some of the processing can be automated, but in general the processing and interpretation of spectra will take longer than the data acquisition if accurate and reliable data are to be generated.

The amount of sequence information generated will vary from one peptide to another. Some peptide sequences will be confirmed totally, others may produce a partial sequence of, say, 4 or 5 amino acid residues. Often a sequence "tag" of 4 or 5 residues is sufficient to search a protein database and confirm the identity of the protein.

[Source: http://www.astbury.leeds.ac.uk/facil/MStut/mstutorial.html]

Putting it all together

big_picture.png

This image summarises a proteomics experiment. We ionise the sample using a technique such as MALDI and feed it into MS1, where we get the primary peaks. This is a garbage-in, garbage-out step, so we need to be careful about the experiment design. Labelling can be done to distinguish case from control.

Another important point to remember is that the MS-MS part of a proteomics experiment is robust to noise. Because we have already labelled the samples at the beginning, we mix everything together. Thus if we introduce any kind of error in subsequent steps, both case and control are affected by that error equally, and we don't need to worry about biases between them.

The end result of MS-MS is peaks. The intensities of these peaks help in quantifying the changes in protein expression in case versus control studies. We can use software such as MaxQuant, or Python and R scripts, for the data analysis part.

Pipeline and various calibrations in the MS-MS instrument

Here is an overview of a proteomics pipeline.

pipeline.png

Peak identification and FDR control

peptide_identification.png

We have a set of theoretical spectra, which contains all possible peptides or all peptides registered by the machine during an experiment.

To understand FDR in the context of MS-MS we need to understand the concept of decoy peptides.

(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2922680/)

Decoy peptides are artificially constructed sequences. The idea is that the search engine considers these decoy peptides along with the real (target) peptides. Incorrect matches to target sequences should behave like matches to decoy sequences, i.e. they should have a similar distribution. If we know the proportion of decoy sequences versus target sequences, we can estimate the total number of incorrect target matches.

Thus, decoy hits guide sensitive filtering criteria for identifying Peptide Spectrum Matches.

Steps:

  • Evaluate the relative proportion of target and decoy sequences in the search space to derive the multiplicative factor required to estimate false positives, if necessary.

  • Estimate false positive-related statistics.

  • Use decoy hits to guide the establishment of filtering criteria.

  • Report statistics for filtered data set.
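The steps above can be sketched as a small calculation. This is a simplified illustration, not the procedure of any particular search engine: the scores are invented, the function name is hypothetical, and `decoy_factor` stands for the multiplicative factor from the first step (1.0 when the target and decoy search spaces are the same size).

```python
def estimate_fdr(psms, score_threshold, decoy_factor=1.0):
    """Estimate the FDR among target PSMs scoring at or above the threshold.

    psms: list of (score, is_decoy) tuples.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < score_threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0
    # Each decoy hit above the threshold stands in for one expected false target hit.
    return decoy_factor * decoys / targets


# Invented peptide-spectrum matches: (search score, matched a decoy sequence?)
psms = [
    (95, False), (90, False), (88, False), (80, True), (75, False),
    (70, False), (60, True), (55, False), (50, True), (40, True),
]

fdr_lenient = estimate_fdr(psms, score_threshold=70)
fdr_strict = estimate_fdr(psms, score_threshold=85)
print(f"FDR at score >= 70: {fdr_lenient:.2f}")
print(f"FDR at score >= 85: {fdr_strict:.2f}")
```

In practice the threshold is moved until the estimated FDR drops below a chosen limit (1% is a common choice), and the statistics are then reported for the filtered set.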

In this image we can see decoy and target sequences sorted by mass.

decoy_1.png

This image describes the distribution of decoy sequences with respect to target sequences for different decoy sequence lengths (L). The following equations govern these curves for decoy peptides.

eq_decoy.png

decoy_2.png

Ideal decoy sequences should have the following characteristics:

  • Similar amino acid distributions as target protein sequences.

  • Similar protein length distribution as target protein sequence list.

  • Similar numbers of proteins as target protein list.

  • Similar numbers of predicted peptides as target protein list.

  • No predicted peptides in common between target and decoy sequence lists.


During MS-MS experiments various kinds of instrument behaviour have to be understood. For example, as the column ages, retention times drift. This drift has to be calibrated.

All these calibrations are done within the MaxQuant software itself. Here is the documentation page for MaxQuant: http://www.coxdocs.org/doku.php?id=maxquant:common:download_and_installation

Here is the link to the 2018 Barcelona MaxQuant workshop:

https://www.youtube.com/playlist?list=PL6yHRLjecpwB0C0xdEhxiXLV1mXgj4bEO

Retention time calibration

retention_calibrartion.png

Mass calibration

The machine during MS1 is highly sensitive to environmental perturbations. In this image we can see the fluctuation in measured mass with respect to retention time (in minutes).

calib_mass.png

calib_mass%292.png


Imagine that we have two proteins to separate during MS2, i.e. instead of the standard single protein, we have two proteins.

One way to separate them is by subtracting the spectrum of one protein from the other.

pep_1.png

So first we identify all intensities associated with the first peptide, subtract them, and then identify the intensities associated with the second peptide.


During peak identification there is one more problem we must address.

Sometimes two overlapping peaks from different experiments are interpreted as a single peak, as can be seen in this image.

feature_detection.png

where,

• Moz tolerance: maximum m/z difference (delta) allowed when matching features between runs

• Time tolerance: maximum retention time difference between 2 runs
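How the two tolerances work together can be sketched as follows. This is an illustration only: the feature values, the tolerance values and the function name are invented, and real software aligns retention times between runs before matching.

```python
def match_features(run_a, run_b, mz_tol, rt_tol):
    """Pair features from two runs that agree within both tolerances.

    Each feature is a (mz, retention_time_min) tuple.
    """
    matches = []
    for mz_a, rt_a in run_a:
        for mz_b, rt_b in run_b:
            if abs(mz_a - mz_b) <= mz_tol and abs(rt_a - rt_b) <= rt_tol:
                matches.append(((mz_a, rt_a), (mz_b, rt_b)))
    return matches


# Invented features from two runs
run_a = [(500.2501, 30.1), (620.3300, 45.0)]
run_b = [(500.2503, 30.4), (620.3400, 52.0)]

matches = match_features(run_a, run_b, mz_tol=0.005, rt_tol=1.0)
for feature_a, feature_b in matches:
    print(feature_a, "matches", feature_b)
```

Tolerances that are too wide merge distinct peaks into one feature, while tolerances that are too narrow split a single peptide into several features.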

Case studies

This discussion of proteomics would be incomplete without some case studies. We start with the basic protocols and then see how they can be applied to more sophisticated case studies.

MS1 SILAC labelling

ms_1_labelling_silac.png

We are performing a control versus case experiment. Here the case is a mouse fed on a normal diet, i.e. Arg0, Lys0, while the controls are healthy mice fed with the heavy isotopes Arg10, Lys8 (the numbers are the additional neutrons).

We take tissue samples and give them to the MALDI instrument. A laser scans through these samples and feeds the ions to the Q Exactive (MS instrument). Thus, we obtain spectra of peptides.

In the resulting spectra we can see that Control:Case for protein Xa is 1:1, which means that there was no change in the expression level of this protein.

For Xb, on the other hand, we see a Control:Case ratio of 1:3, which means that Xb expression was decreased in the condition-positive tissue (case).

Thus, using SILAC labelling we can get a snapshot of the entire proteome for control versus case.

Unlabeled MS1

What if we forgot to put labels on our SILAC samples?

silac_2.png

Imagine that we are working with 5 breast cancer cell lines, and we mixed the cell lines together such that for each cell line we have 200 cells. In total we have 1000 cells where each cell line is 1/5th of the total cells.

When we do the proteome analysis we find a surprising similarity between the proteomes. However, between cell lines there are significant differences in the expression of a number of proteins.

This variability is usually similar for low- and high-abundance proteins, and it even affects many of the most highly expressed proteins, including housekeeping (super important) proteins. So basically, we now have representative ratios of signature proteins for particular cell lines, and we already know the proportion of each cell line.

Now imagine that instead of 5 cancer cell lines we have hundreds of cell lines, and we create a database of such cell lines with representative ratios.

In general, if someone gave us an unknown sample, such a database could be used to identify it.

This experiment is called super-SILAC (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3316730/).

NeuCode (Neutron Encoded SILAC)

neu_encode.png

The Orbitrap is incredibly sensitive, to the degree that the addition of a single neutron is reflected in the spectra.

neu_encode_2.png

Multiplexing: We can see that, as opposed to NGS, a proteomics experiment has at best 3 channels.

By varying the number of neutrons in a particular amino acid across different cell lines, we can increase the number of channels up to 7. As can be seen in the image above, there is a 17.6 mDa difference between channels.
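To get a feeling for how demanding such a small spacing is, here is a rough estimate of the resolving power needed to tell two channels apart. It treats resolving power simply as m/z divided by the smallest m/z difference to be separated; the peptide m/z and charge are invented example values and the helper name is hypothetical.

```python
def required_resolving_power(mz, charge, mass_diff_da):
    """Rough resolving power needed to separate two peaks mass_diff_da apart."""
    mz_diff = mass_diff_da / charge  # the spacing shrinks with increasing charge
    return mz / mz_diff


CHANNEL_SPACING_DA = 0.0176  # 17.6 mDa, as quoted above

# Invented example: a doubly charged peptide observed at m/z 800
resolving_power = required_resolving_power(mz=800.0, charge=2, mass_diff_da=CHANNEL_SPACING_DA)
print(f"required resolving power is roughly {resolving_power:,.0f}")
```

The estimate grows with both m/z and charge, so larger and more highly charged peptides are the hardest to resolve.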

MS2 labelling

TMT_1.png

TMT (Tandem Mass Tag) reagents are used for MS2 labelling. (Alternatively, the isobaric tags iTRAQ are also used; the idea is the same.) Fragmentation usually happens at the NH-CH, CH-CO, and CO-NH bonds.

Imagine that we are studying a protein X. We use TMT to label different positions on this protein. TMT reagents can be used to simultaneously analyze 2 to 11 different peptide samples prepared from cells, tissues or biological fluids.

TMT2.png

When fragmentation takes place, we get a unique reporter ion for each unique position labelled by TMT.

This type of experiment is particularly useful for providing structural information concerning small organic molecules and for generating peptide sequence information.


We need to remember that:

The first MS filters for the precursor ion, which is then fragmented with high energy and a collision gas, e.g. argon. A second mass analyser then filters for the product ions generated by the fragmentation. The advantages of MS/MS are increased sensitivity (due to noise reduction) and the additional structural information you can gain on your analyte from the fragmentation pattern (e.g. on a QTOF, quadrupole time-of-flight, instrument).

Imagine that we are studying post-translational modifications. The same protein can be glycosylated (attachment of glycans, i.e. sugar chains), sumoylated (attachment of a SUMO protein), and so on. Using MS2 we can study these!

Ratio compression problem

ratio_compression.png

What if we took TMT-tagged yeast proteins across 6 channels and contaminated some channels with human cell proteins in equal ratio?

Here the human peptides form precursor ions that are isolated alongside the selected precursor from the original sample (some kind of impurity), and they are fragmented together.

This is a limiting factor of identification rates in shotgun proteomics. The reporter tags from the co-isolated precursors are superimposed on the reporter tags from the selected precursor ion, which gives an inaccurate link between peptide quantity and identity.

Because most of the proteins in a biological sample are typically unregulated, the co-isolated peptides often create reporter tags with equal relative intensity, which pulls the measured ratios towards 1:1.
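The effect can be illustrated with a toy calculation. It assumes that the co-isolated peptides add the same reporter signal to both channels; the numbers and the function name are invented.

```python
def observed_ratio(true_a, true_b, interference):
    """Reporter ratio measured when both channels pick up the same interference."""
    return (true_a + interference) / (true_b + interference)


# Invented reporter intensities: the true ratio between the channels is 10:1
true_a, true_b = 10.0, 1.0

clean = observed_ratio(true_a, true_b, interference=0.0)
compressed = observed_ratio(true_a, true_b, interference=2.0)
print(f"without interference: {clean:.1f}")
print(f"with interference:    {compressed:.1f}")
```

The more interfering signal is co-isolated, the closer the observed ratio moves to 1, which is why the problem is called ratio compression.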

Label-free proteomics

label_free_pro.png

Imagine we want to do an experiment, only we don't want to attach any labels to the samples. In this case we let the algorithm normalise over the samples to deconvolute two entire samples!

This is the cheapest method, has a high dynamic range (registers the largest changes), and gives the deepest coverage of the proteome. But the limitation is the risk of systematic error, because the samples are processed in parallel. Also, since we don't have labels, we basically have only one channel.

This image shows the performance of three different algorithms in deconvoluting two samples. We can see that MaxQuant's label-free deconvolution works best.

approaches_lable_free.png

Real life applications of proteomics

In this section some interesting stories were discussed by Segnor Pavel, about proteomics in his lab and other cool applications. Hold your breath!

Variation in proteome with circadian cycle

Experimental setup:

The experimenter knows the fixed circadian schedule of the mice. The circadian clock of the mice is maintained under lab conditions.

For each protein, the variation was measured at each hour of the day.

All these results are accumulated and fitted to a sinusoidal function.
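A sinusoidal fit of this kind can be sketched with a simple cosinor model, y = mesor + a*cos(wt) + b*sin(wt). The sketch below is not the method used in the study: it assumes a fixed 24 hour period and evenly spaced hourly samples covering one full day, in which case the least-squares solution has a closed form. The data are synthetic and the names are made up.

```python
import math

PERIOD_H = 24.0  # assumed circadian period in hours
OMEGA = 2.0 * math.pi / PERIOD_H


def fit_cosinor(times_h, values):
    """Fit mesor, amplitude and peak time for evenly spaced samples over whole periods."""
    n = len(values)
    mesor = sum(values) / n  # mean level around which the protein oscillates
    a = 2.0 / n * sum(y * math.cos(OMEGA * t) for t, y in zip(times_h, values))
    b = 2.0 / n * sum(y * math.sin(OMEGA * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(a, b)
    peak_hour = (math.atan2(b, a) / OMEGA) % PERIOD_H
    return mesor, amplitude, peak_hour


# Synthetic hourly abundance of one hypothetical protein that peaks at hour 6
hours = list(range(24))
abundance = [10.0 + 3.0 * math.cos(OMEGA * (t - 6)) for t in hours]

mesor, amplitude, peak_hour = fit_cosinor(hours, abundance)
print(f"mesor={mesor:.2f} amplitude={amplitude:.2f} peak at hour {peak_hour:.1f}")
```

Proteins with a large fitted amplitude relative to the noise are the candidates for circadian regulation. For unevenly sampled data a general least-squares fit would be needed instead of the closed form.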

circada.png

This image shows the cyclic activity after compiling the data from the study. As can be seen clearly, there is cyclic activity: expression increases and decreases with the alternation of day and night.

cyclic.png

Comparing various Omics data

We are curious about how Genomics, Transcriptomics, and Proteomics correlate.

multi_omics_comparison.png

Here we see a comparison of protein abundances, ribosomal profiling data, and mRNA expression. Proteins are quantified with the iBAQ method, while RPKM was used for the other two data types.

When comparing the proteome and the transcriptome, the one-to-one correspondence between transcript and protein sequences holds true with only little deviations due to, for example, translation errors and post-processing of the protein sequence.

Upon mapping transcriptomic data to proteomic data we have to consider the splice variants etc. Similarly, for genomics and proteomics, where DNA copy number correlations, loss of heterozygosity etc. have to be considered in the model.

Ribosomal profiling: We know that ribosomes bind to a particular mRNA and produce protein. If we somehow freeze this ribosome, we get two important pieces of information: a.) which transcript (mRNA) produced the given peptide b.) where in the genome this transcript came from

Ribosomal profiling data can be combined with proteomics and messenger RNA. Outliers in these groups may be interesting to study. It is difficult to assign significance to individual data. Thus, a special 2D annotation enrichment is used to understand which classes of gene products show concordant and discordant behaviour.

We have to remember that there is also post-transcriptional regulation of gene expression. One example, often discussed alongside epigenomics, is microRNA-controlled inhibition of transcripts. microRNAs bind to mRNA and form double strands, which cannot be translated into proteins. This mechanism is similar to RNA silencing (RNA interference) in plants.

Half life of proteins

multi_omics_2.png

Each protein has a half-life, i.e. the amount of time after which it has been degraded to half of its initial concentration. We can use proteomics to measure proteins at different points in time over up to a few days, under similar conditions, and study hundreds of half-lives at the same time!
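The definition of half-life translates directly into a formula. The sketch below assumes simple exponential decay with no new synthesis of the measured protein pool, and estimates the half-life from just two measurements; the numbers and the function name are invented, and real studies fit many time points.

```python
import math


def half_life(initial, remaining, elapsed_h):
    """Half-life in hours, assuming exponential decay from initial to remaining."""
    if not 0 < remaining < initial:
        raise ValueError("remaining must be positive and smaller than initial")
    return elapsed_h * math.log(2) / math.log(initial / remaining)


# Invented example: a protein pool drops from 100 to 25 (arbitrary units) in 48 hours
t_half = half_life(initial=100.0, remaining=25.0, elapsed_h=48.0)
print(f"estimated half-life: {t_half:.1f} h")
```

Dropping to a quarter of the starting amount corresponds to two half-lives, so 48 hours gives a half-life of 24 hours in this example.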

half_life.png

A surprising result from these studies is that the half-lives of proteins and mRNAs have a correlation of only 0.02!

half_life_3.png

Also, proteins are 1000 times more abundant than mRNAs!

Single cell proteomics

Using TMT in MS2 allows us to do sophisticated single cell proteomic analysis. The idea is that each cell fraction is tagged at a different location for the given protein of interest, and these can be deconvoluted in the MS-MS data analysis.

single_cell_proteomics.png

In this experiment 8 cell types were labelled with TMT and 200 cells were labelled separately. All these cells were then combined and analysed.

If we plot PC1 vs PC2 (principal component analysis), we can see that the cell types occupy different regions of the plot.

proteomics_2.png

However, and this is no surprise, single cell proteomics has the ratio compression problem.

ratio_comp.png

Protein localisation

Hands down, the coolest thing for me is the ability to study protein localisation, simultaneously, for thousands of proteins in a single experiment!

Locomics! (pun intended, in Spanish loco means crazy)

protein_local.png

How does this work?

A powerful biochemistry method is centrifugation. When we break down the cell, different organelles end up in the supernatant at different rotation speeds of the centrifuge.

locomics.png

Imagine that in an experiment we took the control cells and did proteomics on the supernatant at different centrifugation speeds.

After treating the cells from the same tissue or colony with some hormone/ drug, we repeated the same experiment.

locomics_2.png

Now we have a huge matrix with the proteomes in control versus condition for the different organelles. If we do a PCA, we can see that different proteins occupy different locations in a 2D space.

locomics_3.png

Imagine a protein A which was found in the plasma membrane in the control. After application of the drug or hormone, it has migrated into the principal component space of the mitochondria. In other words, we can say that protein A migrated from the plasma membrane to the mitochondria. And in a single experiment we can do this for the entire proteome! LOCO!

This has been done for a number of cell types, including neuronal cells, in the Map of the Cell project. You can play around with this for different conditions, cell types and proteins.

http://www.mapofthecell.org/

Cancer T-cells & Two Smoking Immunopeptides

lock_stock.png

Segnor Pavel has a good sense of humor. This is a pun for 'Lock Stock and Two Smoking Barrels' movie.


Our body has a mechanism for preventing autoimmunity: using immunopeptides, it differentiates self from foreign.

lock_stock_1.png

Cancer cells exploit this mechanism by faking the self antigens. The idea is that if we can train the immune system to react to a certain type of peptide which is crucial for the survival of cancer cells, we can direct the body's immune response to clear out the cancer cells.

The reason we target certain peptides is that, when a somatic mutation occurs, the peptide also mutates in some cases. Thus this mutated peptide can trigger an auto-immune response. But immune cells suppress autoimmune behaviour, because during our lifetime the cells accumulate many somatic mutations, and if the immune system kept recognising every single one of them as foreign, we would be doomed!

In this study the authors selected melanoma cells, because melanoma has the largest number of somatic mutations.

melanoma_lock_stock.png

If we look at the graph of predicted affinity against the rank of the gene in terms of importance of function, we can see that the SYTL4 gene (our gene of interest) is in 18th position. This means that it is an important and distinct somatic mutation in the given cell line.

lock_stock_final.png

Next, they approached the problem in two ways, multi-allelic and single-allelic, and repeated the experiment for various other cell types such as HeLa cells.

single_HLA-loco.png

One flew over the immunopeptidomics

one_flew_0.png

Pun: the title is similar to 'One flew over the cuckoo's nest' movie.

Enter Quantitative Immunopeptidomics.

cookoo_1.png

In drug design, we study the dissociation constant (K_d) for each drug.

https://en.wikipedia.org/wiki/Dissociation_constant
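For the simplest case, one ligand binding to a single site, the dissociation constant fixes how much of the target is occupied at a given free ligand concentration: fraction bound = [L] / (K_d + [L]). A minimal sketch with invented concentrations and a hypothetical helper name:

```python
def fraction_bound(ligand_conc, kd):
    """Fraction of target occupied for single-site binding (same units for both)."""
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("concentrations must be non-negative and K_d positive")
    return ligand_conc / (kd + ligand_conc)


KD_NM = 5.0  # invented dissociation constant in nM

for conc_nm in (0.5, 5.0, 50.0):
    print(f"[L] = {conc_nm:>4} nM -> {fraction_bound(conc_nm, KD_NM):.2f} of the target bound")
```

When the ligand concentration equals K_d, exactly half of the target is bound, so a lower K_d means tighter binding. Targets with several interacting binding sites need a more elaborate model.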

Kinetically, many biological proteins and enzymes can possess more than one binding site. Usually, when a ligand L binds with a macromolecule M, it can influence binding kinetics of other ligands L binding to the macromolecule. Having multiple binding sites means that the drug is less specific.

This is an image of the kinome for a given drug. By kinome we refer here to all possible interactions of a given drug. Usually when drugs interact they affect enzymes called kinases (which drive biochemical reactions in the cell). Having many interactions means the drug is less specific, and even though the drug works, it has too many side effects.

kinome_1.png

By targeting specific kinases we can get rid of all the side-effects and get one to one drug-effect relationship. Thus, we are usually interested in what regions of the proteins can interact.


At this point, we are told a story. It goes something like this:

Researchers modified one of the stop codons to encode an unnatural amino acid. However, during translation, after the whole peptide was synthesised, as soon as this unnatural amino acid was called by the modified stop codon, the synthesis froze. Basically they had a ribosome with the completely synthesised natural protein with the N-terminal unusual amino acid and the mRNA.

At this point these ribosomes can be isolated along with mRNA and protein, and Genomics, Translatomics, and Proteomics can be done. Transcriptomics could thus be accessed with precise control.

Sources: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000972


Moral of the story, we can combine the two approaches and get specific targets for a given condition.

Sidenote: X-linking experiment

This story is reminiscent of a similar experiment called X-linking (crosslinking).

Crosslinking is the process of chemically joining two or more molecules by a covalent bond. Modification involves attaching or cleaving chemical groups to alter the solubility or other properties of the original molecule.

These are used as a validation method for protein structures.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4172966/

Protein-protein interactions (Interactome)

Background story: Y2H interactome

An important method for studying the interactome is the Y2H (yeast two-hybrid) experiment.

https://www.singerinstruments.com/application/yeast-2-hybrid/

The yeast-2-hybrid system is a simple scientific technique used to screen a library of proteins for potential interactions.

1.) Firstly, a transcription factor is broken into two parts: a DNA-binding domain (BD) and a catalytic activation domain (AD)

2.) The DNA-binding domain is fused to a protein of interest called the bait (e.g. an enzyme)

3.) The activation domain is fused to a number of potential binding partners, called the prey (e.g. different ligands)

4.) If the bait and prey interact, the two parts of the transcription factor are reconstituted and activate transcription of a gene

5.) If the bait and prey do not interact, the two parts of the transcription factor remain separate and transcription doesn't occur

Thousands of interactions have been detected and catalogued in databases such as BioGRID.

The time complexity of this method is n^2, i.e. for n proteins we need to perform on the order of n^2 experiments.


prot_prot.png

Instead of using labels, an alternative is to combine GFP (green fluorescent protein) tagging with label-free MS-MS. The resulting label-free quantification (LFQ) intensity matrix is the basis for all downstream data analysis aimed at identifying interactors of the tagged bait proteins.

GFP-MS-MS is also called AE-MS (affinity enrichment MS).

prot_prot_2.png

Here the entire protein background is labelled with GFP, and MS-MS is performed. Up to 2000 background binders are obtained in a single pull-down (MS1). The background helps in noise reduction (normalization). Interacting proteins are identified not by comparison with a single untagged control, but by comparison with all other tagged strains.

Finally, we validate the potential interactors by their intensity profiles across all samples.

AE-MS (or in general label-free MS) is not only highly efficient and robust, but also cost effective, broadly applicable, and can be performed in any laboratory with access to high-resolution mass spectrometers.

The time complexity of this method is n, i.e. for n proteins we need to perform n experiments!
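The difference in scaling between the two approaches becomes obvious with a few numbers. The sketch assumes that a pairwise screen such as Y2H tests every bait against every prey (n^2 combinations) while AE-MS needs one pull-down per tagged bait; the function names are made up.

```python
def pairwise_screen_experiments(n_proteins):
    """Bait x prey combinations when every protein is tested against every protein."""
    return n_proteins * n_proteins


def pulldown_experiments(n_proteins):
    """One affinity-enrichment pull-down per tagged bait protein."""
    return n_proteins


for n in (10, 100, 1000):
    print(f"n={n:>5}: pairwise={pairwise_screen_experiments(n):>9,}  pull-downs={pulldown_experiments(n):>6,}")
```

For a thousand proteins the pairwise screen already needs a million combinations, while the pull-down approach needs a thousand runs.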


prot_prot_3.png

Plasmids are engineered to produce a protein product in which the DNA-binding domain (BD) fragment is fused onto a protein while another plasmid is engineered to produce a protein product in which the activation domain (AD) fragment is fused onto another protein.

The protein fused to the BD may be referred to as the bait protein, and is typically a known protein the investigator is using to identify new binding partners. The protein fused to the AD may be referred to as the prey protein and can be either a single known protein or a library of known or unknown proteins.

In this context, a library may consist of a collection of protein-encoding sequences that represent all the proteins expressed in a particular organism or tissue, or may be generated by synthesising random DNA sequences. Regardless of the source, they are subsequently incorporated into the protein-encoding sequence of a plasmid, which is then transfected into the cells chosen for the screening method. This technique, when using a library, assumes that each cell is transfected with no more than a single plasmid and that, therefore, each cell ultimately expresses no more than a single member from the protein library.

If the bait and prey proteins interact (i.e., bind), then the AD and BD of the transcription factor are indirectly connected, bringing the AD in proximity to the transcription start site and transcription of reporter gene(s) can occur. If the two proteins do not interact, there is no transcription of the reporter gene. In this way, a successful interaction between the fused protein is linked to a change in the cell phenotype.

The challenge of separating cells that express proteins that happen to interact with their counterpart fusion proteins from those that do not, is addressed in the following section.


Thus interactomics is used to construct large protein-protein interaction maps, as can be seen in this image.

prot_prot_4.png

Literature sources: GFP MS-MS https://academic.oup.com/pcp/article/53/4/617/1840745 Protein interaction networks https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4332878/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4288248/

Cross-species interactome

cross_.png

In this study the researchers took 9 different species across various domains of life. They prepared 6387 fractions and did an LC-MS-MS (label free), standardised the results and mapped to human proteome. Finally, using computational methods and ML they clustered together high-confidence interactions into complexes. They found various conserved complexes throughout the proteome.

Saliva microbiome case study

In this experiment, the researchers collected saliva samples from various individuals and did a proteome analysis. Since they were taking saliva samples, each sample has different microbiome strains.

saliva_study.png

Furthermore, they compared microbiomes from saliva to those from other parts of the body, e.g. we can see that the microbiomes of mouth and stool occupy different principal component spaces and are far apart, suggesting a different makeup of undigested and completely digested food.

Similar studies are being performed in different labs around the world.

saliva_microbiome.png

Plasma proteome profiling

In this final case study the Mann lab focuses on the effect of exercise on the proteome. The participants were made to lose weight, and the proteome was profiled before and after weight loss.

weight_loss.png

Segnor Pavel briefly discussed post-translational modifications, which are also a major focus of case studies in proteomics.

Note: I have sent a mail to Segnor Pavel, describing that Lecture 2 slides are missing. Let's see when he replies. Will add the notes later, I don't remember everything from L-02 but I've tried to fill in the gaps.